Detection of child exploiting chats from a mixed chat dataset as a text classification task

Authors

  • Md. Waliur Rahman Miah
  • John Yearwood
  • Siddhivinayak Kulkarni
Abstract

Detecting child exploitation in Internet chat is important for protecting children from prospective online paedophiles. This paper investigates the effectiveness of text classifiers for identifying Child Exploitation (CE) in chats. Because chatting takes place between two or more users typing text, the text of the chat messages can serve as the data analysed by text classifiers. The identification of CE chats can therefore be framed as a text classification problem: categorizing chat logs into predefined CE types. Alongside three traditional text categorization techniques, a new approach is applied to the task: psychometric and categorical information from LIWC (Linguistic Inquiry and Word Count) is used, and it improves the performance of some of the classifiers. For the experiments, chat logs were collected from various publicly accessible websites. Classification-via-Regression, J48 decision tree, and Naïve Bayes classifiers are used, and their performance is compared in the results.
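The framing in the abstract, chat text in and a predefined category out, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a small from-scratch multinomial Naïve Bayes classifier with add-one smoothing, chosen only because Naïve Bayes is one of the classifiers the abstract names. The `TOY_CATEGORIES` lexicon is a hypothetical stand-in for LIWC (the real dictionary is much larger and is not reproduced here), and the training messages and the labels `sports_chat` and `school_chat` are benign placeholders, not the CE types used in the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for LIWC: a tiny hand-made category lexicon.
# This is an assumption for illustration, not the real LIWC dictionary.
TOY_CATEGORIES = {
    "social": {"friend", "team", "we"},
    "school": {"homework", "exam", "teacher"},
}


def featurize(text):
    """Return bag-of-words tokens plus one pseudo-token per category hit."""
    tokens = text.lower().split()
    extra = [
        f"__cat_{name}__"
        for tok in tokens
        for name, words in TOY_CATEGORIES.items()
        if tok in words
    ]
    return tokens + extra


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.token_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.token_counts[label].update(featurize(doc))
        self.vocab = {t for c in self.token_counts.values() for t in c}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in featurize(doc):
                if tok in self.vocab:  # ignore tokens never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Benign placeholder chat messages and labels (assumptions, not paper data).
train_docs = [
    "we won the match with my team",
    "my friend plays football",
    "the exam is tomorrow",
    "teacher gave us homework",
]
train_labels = ["sports_chat", "sports_chat", "school_chat", "school_chat"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("homework before the exam"))
print(model.predict("my friend joined the team"))
```

Appending category hits as pseudo-tokens is only one simple way to combine lexicon-derived information with word features; the abstract does not say how the LIWC output was fed to the classifiers, so that design choice is an assumption of this sketch.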

Similar resources

Exploiting Associations between Class Labels in Multi-label Classification

Multi-label classification has many applications in text categorization, biology, and medical diagnosis, in which multiple class labels can be assigned to each training instance simultaneously. Because the labels are often related, extracting the existing relationships between the labels and taking advantage of them during the training or prediction phases ...

Classifying Dialogue Acts in Multi-party Live Chats

We consider the task of classifying chat contributions by dialogue act in a multi-party setting. This extends the problem significantly over the 1-1 chat scenario due to the semi-asynchronous and “entangled” nature of the contributions by chat participants. We experiment with a number of machine learning approaches, using different categories of features: lexical, contextual, structural, keyword...

(Dis)agreements in Iranians’ Internet Relay Chats

The present study on politeness is an attempt to examine (dis)agreeing strategies utilized by EFL learners while chatting on the internet. Subjects of the study were forty male and thirty-three female Iranian natives whose internet relay chat (IRC) interactions, composed of 400 excerpts, were collected between December 2007 and September 2008. Data analysis was based on the general taxonomy of ...

A Multi-Label Classification Approach for Coding Cancer Information Service Chat Transcripts

The National Cancer Institute's (NCI) Cancer Information Service (CIS) offers an online, instant-messaging-based information service called LiveHelp to patients, family members, friends, and other cancer information consumers. A cancer information specialist (IS) 'chats' with a consumer and provides information on a variety of topics including clinical trials. After a LiveHelp chat session is finished,...

Learning Document Image Features With SqueezeNet Convolutional Neural Network

The classification of various document images is considered an important step towards building a modern digital library or office automation system. Convolutional Neural Network (CNN) classifiers trained with backpropagation are considered to be the current state-of-the-art model for this task. However, these classifiers have two major drawbacks: the huge computational power demand for...

Publication date: 2011